“Making effective use of metadata of historical texts and corpora”, 7-8 September 2017
Authors
Abstract
tba
Peter Fankhauser (Institut für Deutsche Sprache, Mannheim), Visual correlation for exploring paradigmatic language change
Abstract: Paradigmatic language change occurs when paradigmatically related words with similar usage contexts rise or fall together. We introduce an approach to explore such paradigmatic change in diachronic corpora by visually correlating two factors: frequency change and the distributional semantics of words. Frequency change is visualized by means of color derived from the slope of a logistic growth curve fitted to the frequency trend. The semantics of words is visualized by positioning them in two dimensions such that words with similar usage contexts are positioned closely together. As a result, we get islands of paradigmatically related words with similar color that can act as a guide for exploring language change.
Louisiane Ferlier (Royal Society, London), The Royal Society Journal Collection: unlocking 300 years of scientific periodicals
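The frequency-change coloring described in the Fankhauser abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it approximates the logistic fit with a least-squares line on the logit of relative frequencies, and the function names (`logistic_slope`, `slope_to_color`), the sample data, the `max_abs` scaling, and the red/blue color scheme are all hypothetical.

```python
import math

def logistic_slope(years, rel_freqs, eps=1e-9):
    """Estimate the slope b of p(t) = 1 / (1 + exp(-(a + b*t))).

    Simplifying assumption: fit a straight line to logit(p) by ordinary
    least squares instead of running a full nonlinear fit.
    """
    logits = []
    for p in rel_freqs:
        p = min(max(p, eps), 1 - eps)  # clamp so the logit stays finite
        logits.append(math.log(p / (1 - p)))
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(logits) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, logits))
    var = sum((t - mean_t) ** 2 for t in years)
    return cov / var

def slope_to_color(slope, max_abs=0.05):
    """Map a slope to an RGB hex color: red for rising, blue for falling.

    Slopes near zero fade towards white; max_abs is an assumed scale.
    """
    x = max(-1.0, min(1.0, slope / max_abs))  # normalize to [-1, 1]
    fade = int(round(255 * (1 - abs(x))))
    if x >= 0:
        return "#ff{0:02x}{0:02x}".format(fade)
    return "#{0:02x}{0:02x}ff".format(fade)

# Hypothetical relative frequencies of one word, sampled every 50 years.
years = [1700, 1750, 1800, 1850, 1900]
rising = [0.0001, 0.0002, 0.0004, 0.0008, 0.0016]
falling = list(reversed(rising))

rising_slope = logistic_slope(years, rising)
falling_slope = logistic_slope(years, falling)
print(slope_to_color(rising_slope), slope_to_color(falling_slope))
```

In a full visualization, each word would be drawn in its color at a two-dimensional position derived from its usage contexts, so that neighboring words with matching colors form the "islands" the abstract mentions. The positioning step is not shown here.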
Similar resources
Metadiscourse Use in Popular and Professional Science: The Case of Hedges and Boosters
The present article shows that all scientific texts included in journals, magazines, and newspapers are vulnerable to the penetration of hedges and boosters. However, it was found that scientific texts in the three corpora tended to open up the possibilities of alternative voices rather than narrowing them down. The relatively higher frequency of occurrence of hedges in comparison with booster...
Compiling and Processing Historical and Contemporary Portuguese Corpora
University of Cologne, Albertus-Magnus Platz, 50923 Cologne, Germany. Abstract: This technical report describes the framework used for processing three large Portuguese corpora. Two corpora contain texts from newspapers, one published in Brazil and the other published in Portugal. The third corpus is Colonia, a historical Portuguese collection containing texts written...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in Russian Unified State Exam on English language. Texts that formed small research corpora were retrieved from 2 resources: official USE database as a reference point, and popular website used by pupils for USE training “Neznaika” (https://neznaika.pro/). The size of two corpora is balanced: USE has 11934 tokens and “Neznaika” - 11918 tokens. We share Biber’...
Comparative Study of the Academic Vocabulary Content of Electronic Engineering Corpora, GE Materials and M.S. Entrance Examinations
The importance of vocabulary learning has been underlined in the field of English for Academic Purposes (EAP) because non-English majors who require reading English texts in their fields of study have to expand their English vocabulary knowledge much more efficiently than ordinary ESL/EFL learners. Since academic vocabulary instruction in Iranian universities is realized through the use of Gene...
Building a Corpus-based Historical Portuguese Dictionary: Challenges and Opportunities
Historical corpora are important resources for different areas. Philology, Human Language Technology, Literary Studies, History, and Lexicography are some that benefit from them. However, compiling historical corpora is different from compiling contemporary corpora. Corpus designers have to deal with several characteristics inherent in historical texts, such as: absence of a spelling standard, ...